Generate (Buffered)
POST /generate
The /generate endpoint sends a prompt to the LLM and returns the full response all at once. To stream the response token by token instead, see the /generate_stream endpoint.
To send a batch of requests in a single call, pass an array of strings in the text field instead of a single string. The server also supports dynamic batching: requests that arrive within a short time interval are processed together as a single batch.
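As a minimal sketch of the two forms the text field can take, the snippet below builds (but does not send) a POST request to /generate using only the Python standard library. The base URL http://localhost:3000, the helper name build_generate_request, and the example prompts are assumptions for illustration; substitute the address of your own server.

```python
import json
import urllib.request

# Assumed address; replace with wherever your server is listening.
BASE_URL = "http://localhost:3000"


def build_generate_request(text, base_url=BASE_URL):
    """Build a POST /generate request; text is a string or a list of strings."""
    is_batch = isinstance(text, list) and all(isinstance(t, str) for t in text)
    if not isinstance(text, str) and not is_batch:
        raise TypeError("text must be a string or a list of strings")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A single prompt: text is a plain string.
single = build_generate_request("What is the capital of France?")

# A batch: text is an array of strings.
batch = build_generate_request(
    ["Translate 'hello' to French.", "Translate 'hello' to Spanish."]
)

# To actually send a request (requires a running server):
# with urllib.request.urlopen(single) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.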
Request
- application/json
Body (required)
consumer_group: string, nullable
json_schema: nullable
max_new_tokens: int64, nullable
min_new_tokens: int64, nullable
no_repeat_ngram_size: int64, nullable
prompt_max_tokens: int64, nullable
regex_string: string, nullable
repetition_penalty: float, nullable
sampling_temperature: float, nullable
sampling_topk: int64, nullable
sampling_topp: float, nullable
text (required), one of:
- string
- array of strings
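The request body fields above can be assembled into a JSON payload along the following lines. This is a sketch, not part of the server's API: the helper name make_generate_payload, the choice to omit null options rather than send them, and the example values are all assumptions. Because json_schema has no type listed here, the sketch accepts any value for it.

```python
import json

# Field names and types as listed in the request body.
# json_schema has no listed type, so any value is accepted (assumption).
OPTIONAL_FIELDS = {
    "consumer_group": str,
    "json_schema": object,
    "max_new_tokens": int,
    "min_new_tokens": int,
    "no_repeat_ngram_size": int,
    "prompt_max_tokens": int,
    "regex_string": str,
    "repetition_penalty": (int, float),
    "sampling_temperature": (int, float),
    "sampling_topk": int,
    "sampling_topp": (int, float),
}


def make_generate_payload(text, **options):
    """Return a dict for the /generate body, checking option names and types."""
    payload = {"text": text}
    for name, value in options.items():
        if name not in OPTIONAL_FIELDS:
            raise ValueError(f"unknown field: {name}")
        if value is None:
            continue  # every option is nullable; this sketch simply omits it
        expected = OPTIONAL_FIELDS[name]
        # bool is a subclass of int in Python, so reject it for typed fields.
        if expected is not object and isinstance(value, bool):
            raise TypeError(f"{name} must not be a bool")
        if not isinstance(value, expected):
            raise TypeError(f"{name} has the wrong type")
        payload[name] = value
    return payload


# Illustrative values only, not recommended settings.
payload = make_generate_payload(
    "Write a haiku about rain.",
    max_new_tokens=50,
    sampling_temperature=0.7,
    sampling_topk=None,
)
body = json.dumps(payload)
```

Checking names and types on the client side surfaces typos before the request is sent, instead of relying on an error response from the server.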
Responses
200: Takes in a JSON payload and returns the response all at once.
- application/json
Schema
text (required), one of:
- string
- array of strings
Example (from schema)
{
"text": "string"
}
400: Bad request
422: Malformed request body
503: The server is not ready to process requests yet.
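The documented status codes can be handled on the client along these lines. This is a sketch under assumptions: the names GenerateError, parse_generate_response, and is_retryable are hypothetical, the pairing of each description with its status code follows the order in which they are listed, and treating 503 as worth retrying is a client-side choice rather than documented server behavior.

```python
import json


class GenerateError(Exception):
    """Raised for any non-200 response from /generate."""

    def __init__(self, status, reason):
        super().__init__(f"{status}: {reason}")
        self.status = status
        self.reason = reason


# Descriptions paired with status codes in the order they are listed.
ERROR_REASONS = {
    400: "bad request",
    422: "malformed request body",
    503: "server is not ready to process requests yet",
}


def parse_generate_response(status, body):
    """Return the generated text for a 200 response, else raise GenerateError."""
    if status == 200:
        # text is a string for a single prompt, or a list of strings for a batch.
        return json.loads(body)["text"]
    raise GenerateError(status, ERROR_REASONS.get(status, "unexpected status"))


def is_retryable(status):
    """Assumption: only 'not ready yet' (503) is worth retrying after a delay."""
    return status == 503


result = parse_generate_response(200, '{"text": "Paris"}')
batch_result = parse_generate_response(200, '{"text": ["Bonjour", "Hola"]}')
```

Keeping the parsing logic separate from the HTTP call makes it usable with any client library that exposes a status code and a response body.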